Word Segmentation for Document Images by Successively Merging Adjacent Character Bounding Boxes by Iterative Dilation

Author

  • Dayananda Sagar
Abstract

A new method of word segmentation for document images is presented. The method encloses the letters (characters) of each word in bounding boxes, then progressively fills the spaces between letters by horizontal dilation so that the character bounding boxes merge into word bounding boxes. The method works for inclined and irregularly distributed words. It completely avoids the line segmentation step that normally precedes word segmentation in traditional methods.

Keywords: Bounding boxes, Connected components, Horizontal dilation, Character spacing, Word bounding boxes, Word segmentation, Word spacing.
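The merging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it works on box coordinates rather than image pixels, and the function names, the `step` and `iterations` parameters, and the stopping rule (a fixed number of dilation rounds) are all assumptions made for illustration. Boxes are `(x0, y0, x1, y1)` tuples with inclusive coordinates.

```python
def merge_boxes(a, b):
    """Return the smallest box that encloses both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def touches(a, b, d):
    """True if a and b overlap once each is widened by d pixels per side.

    Only the horizontal extent is dilated; the vertical extents must
    already overlap, so boxes on different text lines stay apart.
    """
    horizontal = a[0] - d <= b[2] + d and b[0] - d <= a[2] + d
    vertical = a[1] <= b[3] and b[1] <= a[3]
    return horizontal and vertical


def segment_words(char_boxes, step=1, iterations=3):
    """Merge character boxes into word boxes by growing horizontal dilation.

    Hypothetical parameters: `step` is the dilation added per round and
    `iterations` is the number of rounds. Together they must bridge the
    character spacing without bridging the word spacing.
    """
    boxes = list(char_boxes)
    for k in range(1, iterations + 1):
        d = k * step  # total dilation per side in this round
        merged = True
        while merged:  # repeat until no pair merges at this dilation
            merged = False
            for i in range(len(boxes)):
                for j in range(i + 1, len(boxes)):
                    if touches(boxes[i], boxes[j], d):
                        boxes[i] = merge_boxes(boxes[i], boxes[j])
                        del boxes[j]
                        merged = True
                        break
                if merged:
                    break
    return sorted(boxes)


# Two words on one line (character gap 3, word gap 12) and one word below.
chars = [
    (0, 0, 4, 10), (7, 0, 11, 10), (14, 0, 18, 10),
    (30, 0, 34, 10), (37, 0, 41, 10),
    (0, 20, 4, 30), (7, 20, 11, 30),
]
words = segment_words(chars, step=1, iterations=3)
```

In this sketch the returned word boxes are the unions of the original character boxes, so the dilation is used only to decide which boxes merge and does not enlarge the result. No line segmentation is performed at any point, which mirrors the claim in the abstract.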


Similar resources

Extraction of text lines and text blocks on document images based on statistical modeling

In this article, we developed a Bayesian model to characterize text line and text block structures on document images using the text word bounding boxes. We posed the extraction problem as finding the text lines and text blocks that maximize the Bayesian probability of the text lines and text blocks given the text word bounding boxes. In particular, we derived the so-called probabilistic linear...


A Font and Size Independent Content Based Retrieval System for Kannada Document Images

This paper presents a content-based image retrieval system for Kannada document images. Given a query word, the system returns the documents in the database that contain a similar word, with the word highlighted. The retrieval works for Kannada document images with different font sizes and styles. First the scanned Kannada document images are preprocessed to reduce image noise. Then ...


Chip Refinement Character Recognition Text Clean - up I 2 Segmentation Texture Segmentation Texture Segmentation Texture Segmentation Texture Generation

There are many applications in which the automatic detection and recognition of text embedded in images is useful. These applications include multimedia systems, digital libraries, and Geographical Information Systems. When machine-generated text is printed against clean backgrounds, it can be converted to a computer-readable form (ASCII) using current Optical Character Recognition (OCR) technol...


Word Spotting in Chinese Document Images without Layout Analysis

An approach to searching user-specified words/phrases in Chinese document images, without the requirements of layout analysis, is proposed in this paper. Bounding boxes of Chinese character images are first determined using connected component analysis. Next, a suitable character from the user-specified word/phrase is chosen as the initial character to search for a matching candidate in the doc...


Persian Printed Document Analysis and Page Segmentation

This paper presents a hybrid method, combining low-resolution and high-resolution analysis, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and the document image is segmented into a set of regions. In the high-resolution page segmentation, connected component analysis segments each region into homogeneous regions and identifyi...




Publication date: 2012